AITopics | bayes regret

Diffusion Models Meet Contextual Bandits

Neural Information Processing Systems, Jun-23-2026, 02:40:40 GMT

Efficient online decision-making in contextual bandits is challenging, as methods without informative priors often suffer from computational or statistical inefficiencies. In this work, we leverage pre-trained diffusion models as expressive priors to capture complex action dependencies and develop a practical algorithm that efficiently approximates posteriors under such priors, enabling both fast updates and sampling. Empirical results demonstrate the effectiveness and versatility of our approach across diverse contextual bandit settings.

artificial intelligence, diffusion model, machine learning, (19 more...)

Country: North America (0.27)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)

Finite-Time Logarithmic Bayes Regret Upper Bounds

Neural Information Processing Systems, Apr-24-2026, 19:33:57 GMT

We derive the first finite-time logarithmic Bayes regret upper bounds for Bayesian bandits. In a multi-armed bandit, we obtain $O(c_\Delta \log n)$ and $O(c_h \log^2 n)$ upper bounds for an upper confidence bound algorithm, where $c_h$ and $c_\Delta$ are constants depending on the prior distribution and the gaps of bandit instances sampled from it, respectively. The latter bound asymptotically matches the lower bound of Lai (1987). Our proofs are a major technical departure from prior works, while being simple and general. To show the generality of our techniques, we apply them to linear bandits. Our results provide insights on the value of the prior in the Bayesian setting, both in the objective and as side information given to the learner. They significantly improve upon existing $\tilde{O}(\sqrt{n})$ bounds, which have become standard in the literature despite the logarithmic lower bound of Lai (1987).
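To make the quantity being bounded concrete, the following is a minimal sketch, not the paper's algorithm or analysis, of how Bayes regret can be estimated by simulation: bandit instances are sampled from the prior, a Bayesian upper confidence bound rule is run on each, and the regrets are averaged. The Gaussian prior, the Gaussian reward noise, the confidence width $\sqrt{2\hat{\sigma}^2 \log n}$, and the names `bayes_ucb_regret` and `estimate_bayes_regret` are all illustrative assumptions.

```python
import numpy as np


def bayes_ucb_regret(n, prior_mean, prior_var, noise_var, rng):
    """Run one n-round episode on an instance drawn from the prior; return its regret."""
    # Assumption: independent Gaussian prior per arm, Gaussian reward noise.
    theta = rng.normal(prior_mean, np.sqrt(prior_var))  # true arm means
    prec = 1.0 / prior_var                # posterior precision per arm
    prec_mean = prior_mean / prior_var    # precision-weighted posterior mean
    regret = 0.0
    for _ in range(n):
        post_var = 1.0 / prec
        post_mean = prec_mean * post_var
        # Assumed confidence width: sqrt(2 * posterior variance * log n).
        ucb = post_mean + np.sqrt(2.0 * post_var * np.log(n))
        arm = int(np.argmax(ucb))
        reward = rng.normal(theta[arm], np.sqrt(noise_var))
        # Conjugate Gaussian update for the pulled arm only.
        prec[arm] += 1.0 / noise_var
        prec_mean[arm] += reward / noise_var
        regret += theta.max() - theta[arm]
    return regret


def estimate_bayes_regret(n, runs, num_arms=5, seed=0):
    """Monte Carlo estimate of Bayes regret: average regret over prior draws."""
    rng = np.random.default_rng(seed)
    prior_mean = np.zeros(num_arms)
    prior_var = np.ones(num_arms)
    total = 0.0
    for _ in range(runs):
        total += bayes_ucb_regret(n, prior_mean, prior_var, 1.0, rng)
    return total / runs


if __name__ == "__main__":
    for horizon in (100, 400, 1600):
        print(horizon, estimate_bayes_regret(horizon, runs=50))
```

The average is taken over instances sampled from the prior, which is what distinguishes Bayes regret from the regret on a single fixed instance.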

bandit, data mining, machine learning, (22 more...)

Country: North America > United States > Texas (0.14)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Data Science > Data Mining > Big Data (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms

Neural Information Processing Systems, Mar-21-2026, 10:20:26 GMT

Hierarchical Bayesian bandit refers to the multi-task bandit problem in which bandit tasks are assumed to be drawn from the same distribution. In this work, we provide improved Bayes regret bounds for hierarchical Bayesian bandit algorithms in the multi-task linear bandit and semi-bandit settings. For the multi-task linear bandit, we first analyze the existing hierarchical Thompson sampling (HierTS) algorithm and improve its gap-independent Bayes regret bound from $O(m\sqrt{n\log{n}\log{(mn)}})$ to $O(m\sqrt{n\log{n}})$ in the case of an infinite action set, with $m$ being the number of tasks and $n$ the number of iterations per task. In the case of a finite action set, we propose a novel hierarchical Bayesian bandit algorithm, named hierarchical BayesUCB (HierBayesUCB), that achieves the logarithmic but gap-dependent regret bound $O(m\log{(mn)}\log{n})$ under mild assumptions. All of the above regret bounds hold in many variants of the hierarchical Bayesian linear bandit problem, including when the tasks are solved sequentially or concurrently. Furthermore, we extend the HierTS and HierBayesUCB algorithms to the multi-task combinatorial semi-bandit setting. Concretely, our combinatorial HierTS algorithm attains a Bayes regret bound of $O(m\sqrt{n}\log{n})$, comparable to the latest known bound. Moreover, our combinatorial HierBayesUCB yields a sharper Bayes regret bound of $O(m\log{(mn)}\log{n})$. Experiments are conducted to validate the soundness of our theoretical results for multi-task bandit algorithms.
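As a reading aid, the setting described above can be written as a two-level generative model together with the multi-task Bayes regret. This is a generic sketch under assumed notation, not the paper's exact formulation: the symbols $\mu_*$, $Q$, $P$, $\theta_s$, $A_{s,t}$, $Y_{s,t}$, $r$ and $\mathcal{BR}$ are introduced here for illustration only.

$$\mu_* \sim Q, \qquad \theta_s \mid \mu_* \sim P(\cdot \mid \mu_*) \ \text{ for } s = 1, \dots, m, \qquad Y_{s,t} \mid A_{s,t}, \theta_s \sim P(\cdot \mid A_{s,t}; \theta_s)$$

$$\mathcal{BR}(m, n) = \mathbb{E}\left[\sum_{s=1}^{m} \sum_{t=1}^{n} \Big( \max_{a} r(a; \theta_s) - r(A_{s,t}; \theta_s) \Big)\right]$$

Here $r(a; \theta_s)$ denotes the mean reward of action $a$ in task $s$, and the expectation is taken over the hyper-parameter $\mu_*$, the task parameters $\theta_s$, the observed rewards, and any randomness in the algorithm. With $m$ tasks and $n$ rounds per task, a bound that is linear in $m$ and logarithmic in $n$, such as $O(m\log{(mn)}\log{n})$, grows much more slowly in $n$ than $O(m\sqrt{n\log{n}})$.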

artificial intelligence, data mining, machine learning, (10 more...)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improved Bayes Regret Bounds for Multi-Task Hierarchical Bayesian Bandit Algorithms

Jiechao Guan 1, Hui Xiong 1,2; 1 AI Thrust, The Hong Kong University of Science and Technology (Guangzhou), China

Neural Information Processing Systems, Feb-16-2026, 08:13:58 GMT

Hierarchical Bayesian bandit refers to the multi-task bandit problem in which bandit tasks are assumed to be drawn from the same distribution.

bayes regret, data mining, machine learning, (20 more...)

Country:

Asia > China > Guangdong Province > Guangzhou (0.40)
Asia > China > Hong Kong (0.40)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > Experimental Study (0.92)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Optimal Scalarizations for Sublinear Hypervolume Regret

Neural Information Processing Systems, Feb-12-2026, 05:53:04 GMT

To address this, some works have proposed piecewise linear scalarizations inspired by economics [Busa-Fekete et al., 2017], while for multi-armed bandits, scalarized

data mining, machine learning, scalarization, (19 more...)

Country: